A Note on Local Ultrametricity in Text
نویسنده
چکیده
High dimensional, sparsely populated data spaces have been characterized in terms of ultrametric topology. This implies that there are natural, not necessarily unique, tree or hierarchy structures de ned by the ultrametric topology. In this note we study the extent of local ultrametric topology in texts, with the aim of nding unique \ ngerprints" for a text or corpus, discriminating between texts from di erent domains, and opening up the possibility of exploiting hierarchical structures in the data. We use coherent and meaningful collections of over 1000 texts, comprising over 1.3 million words.
منابع مشابه
Ultrametricity in Data: Identifying and Exploiting Local and Global Hierarchical Structure
We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. How extent or degree of ultrametricity can be quantified leads us to the discussion of varied practical cases when ultrametricity can be partially or locally present in data. We show how the ultrametricity can be assessed in text or document collections, and in time series signals.
متن کاملDocument Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)
Document images produced by scanner or digital camera, usually suffer from geometric and photometric distortions. Both of them deteriorate the performance of OCR systems. In this paper, we present a novel method to compensate for undesirable geometric distortions aiming to improve OCR results. Our methodology is based on finding text lines by dynamic local connectivity map and then applying a l...
متن کاملFrom Data to the Physics Using Ultrametrics: New Results in High Dimensional Data Analysis
We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. How extent or degree of ultrametricity can be quantified leads us to the discussion of varied practical cases when ultrametricity can be partially or locally present in data. We show how the ultrametricity can be assessed in text or document collections, in time series signals, and in other areas. We conc...
متن کاملIdentifying and Exploiting Ultrametricity
We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. How extent or degree of ultrametricity can be quantified leads us to the discussion of varied practical cases when ultrametricity can be partially or locally present in data. We show how the ultrametricity can be assessed in text or document collections, and in time series signals. In our presentation we ...
متن کاملUltrametric Model of Mind, II: Application to Text Content Analysis
In a companion paper, Murtagh (2012), we discussed howMatte Blanco’s work linked the unrepressed unconscious (in the human) to symmetric logic and thought processes. We showed how ultrametric topology provides a most useful representational and computational framework for this. Now we look at the extent to which we can find ultrametricity in text. We use coherent and meaningful collections of n...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/cs/0701181 شماره
صفحات -
تاریخ انتشار 2005